Fast Context Adaptation via Meta-Learning
We propose CAVIA for meta-learning, a simple extension to MAML that is less
prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA
partitions the model parameters into two parts: context parameters that serve
as additional input to the model and are adapted on individual tasks, and
shared parameters that are meta-trained and shared across tasks. At test time,
only the context parameters are updated, leading to a low-dimensional task
representation. We show empirically that CAVIA outperforms MAML for regression,
classification, and reinforcement learning. Our experiments also highlight
weaknesses in current benchmarks, in that the amount of adaptation needed in
some cases is small.
Comment: Published at the International Conference on Machine Learning (ICML) 2019
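The split between adapted context parameters and fixed shared parameters can be sketched on a toy linear regression model (a minimal illustration with made-up names; the paper meta-trains a neural network, and the outer meta-training loop over the shared parameters is omitted here):

```python
import numpy as np

def adapt_context(shared_w, x, y, n_context=2, n_steps=50, lr=0.1):
    """Test-time adaptation in the spirit of CAVIA on a toy linear model:
    prediction = shared_w @ concat(x, phi). Only the low-dimensional context
    parameters phi are updated; the meta-trained shared weights stay fixed,
    so phi becomes a compact representation of the task."""
    d = x.shape[1]
    phi = np.zeros(n_context)                        # reset for each new task
    for _ in range(n_steps):
        inputs = np.hstack([x, np.tile(phi, (len(x), 1))])  # append context
        err = inputs @ shared_w - y                  # residuals on this task
        phi -= lr * 2.0 * err.mean() * shared_w[d:]  # MSE gradient w.r.t. phi only
    return phi
```

Because only phi is optimised at test time, the adapted quantity is exactly the low-dimensional task representation the abstract refers to.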
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs by using a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam.
Comment: AAMAS 2018, Source code at
https://github.com/lmzintgraf/gp_pref_elici
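Part of the advantage of ranking over isolated pairwise queries is that a single ranked list implies a quadratic number of pairwise comparisons for the preference GP, and the monotonicity trick adds virtual comparisons against the nadir and ideal points. A minimal sketch (function and argument names are ours, not from the released code):

```python
from itertools import combinations

def ranking_to_comparisons(ranking, ideal=None, nadir=None):
    """Expand one user ranking (best first) into the pairwise comparisons it
    implies, optionally adding virtual comparisons: the ideal point beats
    every option, and every option beats the nadir point."""
    pairs = [(a, b) for a, b in combinations(ranking, 2)]  # (winner, loser)
    if ideal is not None:
        pairs += [(ideal, r) for r in ranking]
    if nadir is not None:
        pairs += [(r, nadir) for r in ranking]
    return pairs
```

A ranking of n items thus yields n(n-1)/2 comparisons from one interaction, which is why ordered elicitation extracts more information per query than asking pairwise questions one at a time.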
Quality Assessment of MORL Algorithms: A Utility-Based Approach
Sequential decision-making problems with multiple objectives occur often in practice. In such settings, the utility of a policy depends on how the user values different trade-offs between the objectives. Such valuations can be expressed by a so-called scalarisation function. However, the exact scalarisation function may be unknown at the time the agent must learn or plan. Therefore, instead of a single solution, the agents aim to produce a solution set that contains an optimal solution for every possible scalarisation. Because it is often not possible to produce an exact solution set, many algorithms have been proposed that produce approximate solution sets instead. We argue that comparisons between these algorithms should be made on the basis of user utility, and across a wide range of problems. In practice, however, the quality of these algorithms has typically been compared using only a few limited benchmarks, with metrics that do not directly express the utility for the user. In this paper, we propose two metrics that express either the expected utility or the maximal utility loss with respect to the optimal solution set. Furthermore, we propose a generalised benchmark in order to compare algorithms more reliably.
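Under the simplifying assumption of linear scalarisations u_w(v) = w · v and a finite sample of user weight vectors, the two metrics can be sketched as follows (illustrative only; the paper treats general scalarisation functions and utility distributions, and the function name is ours):

```python
import numpy as np

def expected_utility_and_max_loss(approx_set, optimal_set, weights):
    """Score an approximate solution set against the optimal one.
    Each row of a set is the multi-objective value vector of one policy;
    for each weight vector the user would pick the best policy available."""
    approx, optimal = np.asarray(approx_set), np.asarray(optimal_set)
    utilities, losses = [], []
    for w in np.asarray(weights):
        best_approx = (approx @ w).max()    # utility the user actually gets
        best_optimal = (optimal @ w).max()  # utility an exact set would give
        utilities.append(best_approx)
        losses.append(best_optimal - best_approx)
    return float(np.mean(utilities)), float(np.max(losses))
```

The first value is the expected utility over the sampled preferences; the second is the worst-case utility loss, i.e. how badly the least-served user fares compared to the optimal solution set.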
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
Real-world decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of
research in reinforcement learning and decision-theoretic planning either
assumes only a single objective, or that multiple objectives can be adequately
handled via a simple linear combination. Such approaches may oversimplify the
underlying problem and hence produce suboptimal results. This paper serves as a
guide to the application of multi-objective methods to difficult problems, and
is aimed at researchers who are already familiar with single-objective
reinforcement learning and planning methods who wish to adopt a multi-objective
perspective on their research, as well as practitioners who encounter
multi-objective decision problems in practice. It identifies the factors that
may influence the nature of the desired solution, and illustrates by example
how these influence the design of multi-objective decision-making systems for
complex problems.
Interactive multi-objective reinforcement learning in multi-armed bandits for any utility function
In interactive multi-objective reinforcement learning (MORL), an agent has to simultaneously learn about the environment and the preferences of the user, in order to quickly zoom in on those decisions that are likely to be preferred by the user. In this paper we study interactive MORL in the context of multi-objective multi-armed bandits. Contrary to earlier approaches to interactive MORL, we do not make stringent assumptions about the utility functions of the user, but allow for non-linear preferences. We propose a new approach called Gaussian-process Utility Thompson Sampling (GUTS), which employs non-parametric Bayesian learning to allow any type of utility function, exploits monotonicity information, and limits the number of queries posed to the user by ensuring that questions are statistically significant. We show empirically that, in contrast to earlier methods, GUTS can learn non-linear preferences, and that the regret and number of queries posed to the user are highly sub-linear in the number of arm pulls.
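The Thompson-sampling part of this scheme can be sketched under a strong simplification: a linear utility with a Gaussian posterior over its weights, instead of the paper's non-parametric GP utility model (the names and the linearity assumption are ours):

```python
import numpy as np

def utility_thompson_step(arm_value_estimates, w_mean, w_cov, rng):
    """One decision step: sample a utility function from its posterior and
    pull the arm whose estimated multi-objective value maximises it.
    Randomness in the sampled utility drives exploration across trade-offs."""
    w = rng.multivariate_normal(w_mean, w_cov)  # posterior sample of utility
    return int(np.argmax(np.asarray(arm_value_estimates) @ w))
```

In the full algorithm the posterior over utilities is a GP fitted to the user's pairwise answers, and queries are only issued when they are statistically informative; the step above isolates how a sampled utility selects the next arm.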
Visualizing Deep Neural Network Decisions: Prediction Difference Analysis
This article presents the prediction difference analysis method for visualizing the response of a deep neural network to a specific input. When classifying images, the method highlights areas in a given input image that provide evidence for or against a certain class. It overcomes several shortcomings of previous methods and provides great additional insight into the decision-making process of classifiers. Making neural network decisions interpretable through visualization is important both to improve models and to accelerate the adoption of black-box classifiers in application areas such as medicine. We illustrate the method in experiments on natural images (ImageNet data), as well as medical images (MRI brain scans).
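The core idea can be sketched for tabular inputs: marginalise one feature out by resampling it from reference data and record how much the predicted class probability drops (a simplification of the article's method, which simulates absent image patches with conditional sampling; names are ours):

```python
import numpy as np

def prediction_difference(predict_proba, x, target, reference, rng, n=20):
    """Relevance score per feature: base probability of the target class
    minus its average probability when the feature is resampled from the
    reference data. Positive = evidence for the class, negative = against."""
    base = predict_proba(x)[target]
    scores = np.zeros(len(x))
    for i in range(len(x)):
        probs = []
        for _ in range(n):
            x_mod = x.copy()
            x_mod[i] = reference[rng.integers(len(reference)), i]
            probs.append(predict_proba(x_mod)[target])
        scores[i] = base - np.mean(probs)  # drop when feature is "removed"
    return scores
```

Features whose removal leaves the prediction unchanged score near zero, which is how the method separates evidence for a class from irrelevant input regions.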